Clostridium difficile is a major problem as an aetiological agent for antibiotic-associated diarrhoea. The mechanism by which the bacterium colonizes the gut during infection is poorly understood, but undoubtedly involves a myriad of components present on the bacterial surface. The mechanism of C. difficile surface-layer (S-layer) biogenesis is also largely unknown but involves the post-translational cleavage of a single polypeptide (surface-layer protein A; SlpA) into low- and high-molecular-weight subunits by Cwp84, a surface-located cysteine protease. Here, the first crystal structure of the surface protein Cwp84 is described at 1.4 Å resolution and its key structural components are identified. The truncated Cwp84 active-site mutant (amino-acid residues 33-497; C116A) comprises three regions: a cleavable propeptide, a cysteine protease domain with a cathepsin L-like fold and a newly identified putative carbohydrate-binding domain with a bound calcium ion, which is referred to here as a lectin-like domain. This study thus provides the first structural insights into Cwp84 and a strong basis for elucidating its role in the C. difficile S-layer maturation mechanism.

1. Introduction 

Disruption of the normally protective gut flora results in the extensive colonization and growth of Clostridium difficile (Guarner & Malagelada, 2003), a predominantly nosocomially acquired Gram-positive, spore-forming bacterium. C. difficile infection (CDI) can lead to severe diarrhoea, pseudomembranous colitis, toxic megacolon and ultimately death (Kachrimanidou & Malisiovas, 2011; Rupnik et al., 2009). In recent years, CDI has become a global burden both medically and economically (Bouza, 2012; Dubberke & Olsen, 2012).

C. difficile expresses a self-assembling paracrystalline protein array on its outermost surface, known as an S-layer. The S-layer is largely derived from the post-translational cleavage of a single polypeptide (surface-layer protein A; SlpA) into low- and high-molecular-weight subunits (LMW SLP and HMW SLP, respectively) by Cwp84, a surface-located cysteine protease (Calabi et al., 2001; Cerquetti et al., 2000; Karjalainen et al., 2001; Kirby et al., 2009).

The HMW SLP contains three putative cell-wall binding/anchoring domains (CWBDs; Pfam 04122) which are thought to mediate noncovalent binding to the bacterial cell surface via a currently unknown mechanism. A total of 28 S-layer paralogues, including Cwp84, each containing three Pfam 04122 repeats at either the N- or C-terminus with a 'functional' domain at the other end, have been identified in the C. difficile genome (Calabi et al., 2001; Fagan et al., 2011; Monot et al., 2011; Sebaihia et al., 2006).
A number of these putative surface proteins have been found to play key roles in cell physiology and adhesion (Kirby et al., 2009; Reynolds et al., 2011; Waligora et al., 2001), and have been shown to elicit an immune response in vivo during infection (Wright et al., 2008). Using the ClosTron gene-knockout system, we have demonstrated that a number of C. difficile surface-associated genes containing Pfam 04122 repeats may play a role in adhesion in vitro, and that some, particularly cwp84, may also affect the release of the potent C. difficile toxins (Kirby et al., unpublished work).

Cwp84 (cell-wall protein of ~84 kDa) is an 803-residue surface-associated protein containing an N-terminal cysteine protease domain, a linker region of roughly 170 residues of unknown function and three Pfam 04122 repeats (Fig. 1a; Janoir et al., 2004, 2007). Cwp84 has been shown to be responsible for the maturation of the SlpA precursor protein (Dang et al., 2010; de la Riva et al., 2011; Kirby et al., 2009) and has also been implicated in the degradation of extracellular matrix proteins such as fibronectin, laminin and vitronectin (Janoir et al., 2007).

Despite the key role played by Cwp84 in S-layer biogenesis, it has been reported that neither chemical inhibition of Cwp84 (Dang et al., 2010) nor inactivation of the cwp84 gene (de la Riva et al., 2011; Kirby et al., 2009) is bactericidal, although severe growth defects were seen in both cases. These results indicate that correct maturation of SlpA by Cwp84 is vital for maintaining healthy bacterial cells; perturbing this process may therefore affect the ability of the bacterium to thrive in vivo and thus to compete with other bacterial species in certain environments, such as the complex microbiome of the intestine. Nevertheless, in a hamster model of acute infection we previously showed that a cwp84 knockout strain of C. difficile was not attenuated for virulence, and suggested that endogenous proteases within the intestinal tract may artificially mature/cleave SlpA (Kirby et al., 2009). However, our unpublished observations suggest that C. difficile toxin release is altered in the cwp84 mutant, which may offset its severe growth defects (Kirby et al., unpublished work). Even so, it has been speculated that interruption of S-layer biogenesis may make the bacterium more susceptible to antibiotics (Dang et al., 2010). This makes Cwp84 a potential target for novel prophylactic or therapeutic drugs against CDI, the development of which would be guided by structural analyses of the protein.

Cwp84 is a member of the C1A cysteine protease family (Rawlings et al., 2010), also known as papain proteases, with a putative catalytic dyad comprising residues Cys116 and His262, aided by Asn287 (Savariau-Lacomme et al., 2003). Recently, Dang and coworkers showed that Cwp84 carrying the substitution Cys116Ala did not cleave SlpA in an Escherichia coli-based co-expression assay, confirming that Cys116 is a catalytically important residue (Dang et al., 2010). Papain peptidases are typically composed of an N-terminal signal peptide, a propeptide and the catalytic domain. After removal of the signal peptide by a signal peptidase, the proenzyme often (but not always; Dahl et al., 2001; Nagler et al., 1999) undergoes self-cleavage, removing the proregion and generating the mature, active enzyme (Beton et al., 2012; Chapeton-Montes et al., 2011). It has been proposed that the propeptide ensures the correct folding of the protein (Chapeton-Montes et al., 2011). A recent study by de la Riva and coworkers showed that Cwp84 is produced as an inactive proenzyme and is processed into the active enzyme of 77 kDa by removal of the signal peptide and the proregion up to Ser92, and that this activation step is unlikely to be autocatalytic (de la Riva et al., 2011).

Figure 1

(a) Domain structure of full-length Cwp84. The domains are indicated as follows: signal peptide, grey; propeptide, red; cysteine protease, green; lectin-like, cyan; CWBDs, purple. Active-site residues are indicated in pink, while calcium ion-coordinating residues are shown in orange. The region crystallized, consisting of residues 33-497, is bracketed below. (b) Ribbon diagram of the three-dimensional structure of the propeptide, cysteine protease and lectin-like domains. The domains are coloured according to (a) and the calcium ion is represented as an orange sphere. The disordered region between Lys81 and Tyr89 can be observed as a discontinuity in the ribbon at the bottom centre of the image. (c) Molecular surface of Cwp84(33-497). The close interaction of the propeptide with the cysteine protease and lectin-like domains is shown, particularly at the active site formed at the interface between the cysteine protease and lectin-like domains. The domains are coloured according to (a).

Although adherence and subsequent colonization are key milestones in C. difficile infection, there are considerable gaps, particularly with regard to structural data, in the understanding of how the surface proteins of C. difficile interact with each other and with their environment. To date, there has been only one previous report of structural information for a C. difficile surface protein, which presented the crystal structure of an N-terminal fragment of the low-molecular-weight subunit of the S-layer at 2.4 Å resolution (PDB entry 3cvz) and structures based on solution-scattering (SAXS) experiments of both full-length LMW SLP and the complex formed by LMW SLP and HMW SLP (Fagan et al., 2009).

To further the understanding of C. difficile S-layer biogenesis, we report a high-resolution (1.4 Å) crystal structure of the N-terminal cysteine protease domain of Cwp84. Interestingly, the hitherto uncharacterized 170-residue 'linker' region between the cysteine protease domain and the putative location of the first Pfam 04122 repeat adopts a lectin-like domain structure with a bound calcium ion.

2. Materials and methods 

2.1. Protein expression and purification 

A synthetic gene encoding C. difficile Cwp84 residues 33-497 (from strain QCD-32g58; ribotype 027) with a C116A mutation (an inactive mutant; Life Technologies GeneArt Ltd) was cloned by PCR into the GST expression vector pGEX-6P-1. The mutation was introduced in an attempt to circumvent the poor expression, degradation and purification problems encountered in initial trials with multiple constructs designed without the mutation. Neither of the two constructs produced with the mutation showed these problems, and one was purified to near-homogeneity in a single step (see below). The structure presented here was obtained with this construct.

The gene was amplified from the stock pMA vector by PCR with Expand High Fidelity polymerase (Roche) using primers incorporating a BamHI cleavage site at the 5' end and a NotI site at the 3' end, the latter preceded by a TAA stop codon (forward primer GAGAGTCCTCGGATCCCACAAAACCCTGGATGGCGTGGAA, reverse primer CTCTCTCGCGGCCGCTCTTAGCTGGTTTTGGTGATCGCTT). The PCR products were digested with BamHI and NotI (NEB) and cloned into pGEX-6P-1 using T4 DNA ligase (New England Biolabs) to generate pGEX-6P-1-Cwp84(33-497)C116A.

The plasmid was transformed into E. coli BL21*(DE3) cells. Cultures were grown from glycerol stocks in 5 ml LB supplemented with 100 µg ml⁻¹ ampicillin for 17 h and centrifuged (5000g, 10 min). The cell pellets were washed with water, centrifuged a second time, resuspended in water and used to inoculate 500 ml selenomethionine medium (Molecular Dimensions) supplemented with 100 µg ml⁻¹ ampicillin. These cultures were grown with shaking (200 rev min⁻¹, 37°C) to an OD600 of 0.7. The temperature was reduced to 16°C and methionine biosynthesis was inhibited by the addition of 100 µg ml⁻¹ lysine, phenylalanine and threonine and 50 µg ml⁻¹ leucine, isoleucine and valine. Selenomethionine (60 µg ml⁻¹) was also added and the cultures were incubated for 15 min before expression was induced with 1 mM IPTG. The cultures were incubated for a further 18 h and harvested by centrifugation (8000g, 10 min).

The cell pellets were resuspended in PBS (140 mM NaCl, 2.7 mM KCl, 5 mM DTT, 10 mM Na2HPO4, 1.8 mM KH2PO4 pH 7.3), lysed in a French press and clarified by centrifugation (75 000g, 25 min). The supernatant was loaded onto a GSTrap column (GE Healthcare) and washed with PBS, and tagged protein was eluted with 10 mM glutathione, 50 mM Tris-HCl pH 8.0. PreScission protease (80 µl) was added and the eluted protein was dialyzed overnight into cleavage buffer (50 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA, 5 mM DTT pH 7.5). The dialyzed sample was then reloaded onto the GSTrap column to separate the unbound protein from the tag.

Unbound protein was concentrated to a volume of roughly 1 ml and further purified by size-exclusion chromatography (SEC) into 50 mM Tris-HCl pH 8.0 using a Superdex 200 16/600 column; fractions containing Cwp84(33-497)C116A were pooled and concentrated to 11.9 mg ml⁻¹.

2.2. Trypsin cleavage of Cwp84 

GST-Cwp84(33-497) was incubated with trypsin at a molar ratio of approximately 10:1 for 45 min. Following purification by SEC in 25 mM MOPS pH 7.0, the resulting single species, Cwp84(92-497), was analysed by electrospray ionization mass spectrometry. Cwp84(92-497) was also transferred onto PVDF and sent for N-terminal sequencing (AltaBioscience).

2.3. X-ray crystallographic studies 

Crystallization conditions were screened with a range of pre-prepared 96-well screens (Molecular Dimensions) using an Art Robbins Phoenix nanodispensing robot. The optimal condition was reproduced in 0.3 µl drops with a 1:1 ratio of protein to reservoir solution (0.2 M ammonium sulfate, 30% PEG 4000; Molecular Dimensions Structure Screen 1 & 2, solution D7). Crystals took between 3 d and one week to grow.

Figure 2

Multiple sequence alignment of Cwp84(33-497) and the highest-scoring unique BLAST hits. All are cysteine proteases that possess a putative lectin-like domain. The alignment was performed using ClustalW2 (Larkin et al., 2007) and rendered with ALINE (Bond & Schüttelkopf, 2009). Strictly conserved residues are shown in yellow, medium to well conserved residues in orange and slightly conserved residues in blue. The secondary structure of Cwp84, as assigned by DSSP (Kabsch & Sander, 1983), is also shown, coloured according to Fig. 1. 3₁₀-Helices and β-bridges are displayed in the same way as α-helices and β-strands, but are not numbered. Active-site residues (Gln110, Cys116 and His262) are indicated with pink stars, the propeptide cleavage site (Lys91-Ser92) is indicated with a black arrow, and the occluding loop and PBL regions are indicated with blue and red triangular brackets, respectively. Sequences are taken from the following NCBI GenBank references: Cwp84, NC_009089; Eubacterium CAG:202, CDC03302; Ruminococcus bromii, YP_007780613; Eubacterium CAG:581, CDF12829; Clostridium hiranonis, WP_006441026; Peptostreptococcus stomatis, WP_007788460; P. anaerobius, WP_002842957; Anaerococcus hydrogenalis, WP_004816163; Methanosarcina mazei, NP_632235. The proteins from C. hiranonis, P. stomatis and P. anaerobius possess three putative Pfam 04122 repeats and are thus likely to be S-layer proteins performing functions similar to that of Cwp84.

Table 1

X-ray crystallographic statistics. Values in parentheses are for the outer shell.

X-ray diffraction data were collected on beamline I03 at Diamond Light Source (DLS; Didcot, Oxfordshire, England). The diffraction data were recorded with 1.0° oscillation on a Pilatus 6M detector from four crystals to obtain maximum redundancy. Selenium-fluorescence peak and inflection data were collected from all four crystals (to a maximum resolution of 1.73-1.87 Å), while high- and low-remote data were collected from two crystals (to a maximum resolution of 1.94-2.16 Å). 1120 peak images were collected at 12 660 eV, 1120 inflection images at 12 656 eV, 540 low-remote images at 12 550 eV and 540 high-remote images at 12 770 eV. The data were automatically indexed and integrated with XDS (Kabsch, 2010) and xia2 (Winter et al., 2013), respectively. The data were scaled (and resolutions cut to those reported in Table 1 to reduce errors) with SCALA (Diederichs & Karplus, 1997), combined with CAD (CCP4; Winn et al., 2011) and put into the Crank MAD pipeline (CCP4; Ness et al., 2004) with a resolution cutoff of 2.5 Å, using SCALEIT (Howell & Smith, 1992), AFRO (CCP4), CRUNCH2 (de Graaff et al., 2001), BP3 (Pannu et al., 2003; Pannu & Read, 2004), SOLOMON (Abrahams & Leslie, 1996) and 500 cycles of Buccaneer/REFMAC (Cowtan, 2006; Murshudov et al., 2011). CRUNCH2 found 55 candidate selenium sites, compared with the 48 expected within the unit cell; their validity was assessed by the later programs, allowing Buccaneer and REFMAC to produce an output model with a figure of merit of 85.6% and Rcryst and Rfree values of 24.8 and 27.7%, respectively. The model was further refined with Coot/REFMAC (Emsley & Cowtan, 2004) using a 1.4 Å resolution native data set collected on a Pilatus 6M on beamline I02 at DLS that had been autoprocessed with XDS and xia2 and scaled with AIMLESS (Evans, 2006). Secondary structure was determined using DSSP (Kabsch & Sander, 1983) and the model was verified with MolProbity (Chen et al., 2010).
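The MAD energies and the selenium-site count above can be related with simple arithmetic. The sketch below converts each collection energy to a wavelength with λ(Å) = hc/E, taking hc ≈ 12 398.42 eV Å, and back-calculates how many selenium sites per chain the 48 expected sites per unit cell imply, given two chains in the asymmetric unit and the two asymmetric units per cell of space group P2₁ (reported in Section 3.1). That the expected count was derived in this way is our assumption; the text does not say. Function names are hypothetical.

```python
"""Back-of-envelope checks on the MAD data collection (values from the text)."""

HC_EV_ANGSTROM = 12398.42  # h*c expressed in eV * Angstrom

MAD_ENERGIES_EV = {
    "peak": 12660,
    "inflection": 12656,
    "low remote": 12550,
    "high remote": 12770,
}


def energy_to_wavelength(energy_ev: float) -> float:
    """Convert an X-ray photon energy (eV) to a wavelength (Angstrom)."""
    return HC_EV_ANGSTROM / energy_ev


def sites_per_chain(sites_per_cell: int, chains_per_asu: int, asu_per_cell: int) -> float:
    """Share the expected anomalous sites in the unit cell among its chains."""
    return sites_per_cell / (chains_per_asu * asu_per_cell)


if __name__ == "__main__":
    for label, energy in MAD_ENERGIES_EV.items():
        print(f"{label:>11}: {energy} eV -> {energy_to_wavelength(energy):.4f} A")
    # P2_1 has two asymmetric units per cell; two chains per asymmetric unit.
    print("expected Se sites per chain:", sites_per_chain(48, 2, 2))
```

With these inputs the peak energy corresponds to roughly 0.979 Å, and the expected count works out to 12 sites per chain.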

The atomic coordinates and structure-factor amplitudes 
have been deposited with the RCSB Protein Data Bank 
(http://www.pdb.org) under PDB accession code 4ci7. 



3. Results 
3.1. Overview 

We have determined the crystal structure of a truncated Cwp84 active-site mutant, residues 33-497, which comprises the propeptide, the cysteine protease domain and the newly identified 'lectin-like' domain (Fig. 1). A BLASTP search using Cwp84(33-497) from strain 630 revealed that this combination of a cysteine protease domain and a 'lectin-like' domain appears to be present in a number of species within the order Clostridiales and in a small number of archaea (Fig. 2), suggesting that this particular domain arrangement is conserved. DALI searches using the whole structure did not reveal any proteins within the PDB with structural similarity over both domains.

The high-resolution structure was solved in the monoclinic space group P2₁ to 1.4 Å resolution with two molecules in the crystallographic asymmetric unit. It was refined to final Rcryst and Rfree values of 13.8 and 16.9%, respectively; the model also contains two calcium ions, two sulfate ions, eight PEG molecules, six glycerol molecules and 927 water molecules, and the estimated solvent content is 43.8%. The calcium ions were identified by their fit to the electron density and were confirmed by their coordination bond lengths (Harding, 2004; Zheng et al., 2008). Overall, 96.1% of the residues are in the preferred regions of the Ramachandran plot, with 3.9% in the allowed regions and no outliers. The crystallographic statistics are summarized in Table 1. Poor electron density was observed between residues Gly58 and Tyr63, although we were able to interpret this part of the structure with a fair degree of certainty; little to no density was observed between Lys81 and Tyr89, so this region was not built (Fig. 3a).
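The refinement statistics quoted above rest on two standard quantities: the crystallographic R factor, R = Σ| |Fo| − |Fc| | / Σ|Fo|, computed separately for the working set (Rcryst) and for a held-out test set (Rfree), and the Matthews relation V_solv = 1 − 1.23/V_M linking solvent content to the Matthews coefficient. The sketch below is a minimal illustration, not how REFMAC computes them: it assumes Fc is already on the scale of Fo (refinement programs also fit an overall scale factor) and uses invented amplitudes. Function names are hypothetical.

```python
"""Minimal definitions behind the refinement statistics quoted above."""


def r_factor(f_obs, f_calc):
    """Conventional R = sum | |Fo| - |Fc| | / sum |Fo|.

    Assumes f_calc is already on the same scale as f_obs; refinement
    programs also fit an overall scale factor, which is omitted here.
    """
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by their free-set flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    return r_factor(*zip(*work)), r_factor(*zip(*test))


def matthews_coefficient(solvent_fraction):
    """Invert V_solv = 1 - 1.23 / V_M to give V_M in A^3/Da."""
    return 1.23 / (1.0 - solvent_fraction)


if __name__ == "__main__":
    # Hypothetical structure-factor amplitudes, for illustration only.
    fo = [120.0, 85.0, 60.0, 44.0, 30.0, 18.0]
    fc = [112.0, 90.0, 57.0, 47.0, 26.0, 20.0]
    free = [False, False, True, False, False, True]
    print("R_work = %.3f, R_free = %.3f" % r_work_and_free(fo, fc, free))
    print("V_M for 43.8%% solvent: %.2f A^3/Da" % matthews_coefficient(0.438))
```

For the reported 43.8% solvent content this gives V_M of about 2.19 Å³ Da⁻¹. Because the test reflections are excluded from refinement, Rfree normally stays somewhat above Rcryst, as it does here.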



3.2. Propeptide 

The propeptide consists largely of loop regions with a central helix (α1) and a short β-strand (β1). The poorly defined region was found to contain a short helix in chain B but not in chain A; our secondary-structure numbering assumes that this helix is not present.

Bradshaw et al. • Cwp84 • Acta Cryst. (2014). D70, 1983-1993 • research papers

Figure 3

Cysteine protease propeptide and active-site groove. (a) The full length of the propeptide from His33 to Lys91 shown with sticks, ribbon and electron density (1σ, 2Fo − Fc map). The novel fold of the 30 residues is shown at the bottom of the image, while the canonical section within the active-site groove is shown at the top. Poor density that allowed modelling of Gly58-Tyr63 with a fair level of confidence can be observed on the right, and the lack of density for the unmodelled section towards the end of the propeptide is shown at the top. (b, c) Molecular surface of the cysteine protease active-site groove containing the propeptide; the two views are 50° apart. As in Fig. 1(a), the cysteine protease domain is shown in green and the lectin-like domain in cyan; the three active-site residues are shown in pink. Propeptide residues before Asn64 have been removed for clarity. Met73 shows multiple conformations. Owing to the proximity of the side-chain carbonyl of Asn114 and the backbone carbonyl of Asn261 (4.7 Å in chain A and 4.6 Å in chain B), a continuous section of surface is shown above the active site. The propeptide fills the active-site groove and is in close contact with both domains. (d) Active site of Cwp84, showing the catalytic residues, the residues forming the negatively charged S2 pocket and Val66 from the propeptide. The negatively charged S2 pocket is shown surrounded by the residues that form it: Ser235, which shows multiple conformations, Thr317, Asp318 and Asp320. Note that Val66 does not enter the negatively charged pocket, but we propose that the P2 lysine of SlpA would. The oxyanion hole, formed by Gln110 and Cys116Ala, which stabilizes a catalytic intermediate, is also visible on the left. (e) Occlusion of the active-site residues by Asn114 and Asn261. We propose that their proximity to each other results from interactions with the propeptide and helps to prevent substrate binding. Upon removal of the propeptide, the distance may lengthen slightly, opening the active site.

The N-terminal portion of the Cwp84 propeptide (His33-Gly65) wraps around the lectin-like domain (Figs. 1b and 1c) and does not resemble the propeptides of other papain proteases, which commonly form a small globular domain covering the top of the active site and are stabilized by a β-sheet formed by interaction with the prosegment binding loop (PBL; Figs. 4a and 4b). This novel conformation leaves the S′ end of the active-site groove (the portion of the active-site groove that interacts with the peptide substrate after the scissile bond, based on the active-site nomenclature of proteases; Sajid & McKerrow, 2002; Schechter & Berger, 1967) significantly more accessible than in other cysteine proteases. Nevertheless, the catalytic residues are partially occluded by Asn114 and Asn261 (Fig. 3e).
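The S/S′ nomenclature used above (Schechter & Berger, 1967) labels substrate residues outward from the scissile bond: P1, P2, ... towards the N-terminus and P1′, P2′, ... towards the C-terminus, with the enzyme subsite that binds Pi called Si. The sketch below applies this labelling to the propeptide cleavage site Lys91-Ser92, using Lys91 followed by the N-terminal sequence SSVAY reported for the matured protein. The function name and the choice of three positions per side are illustrative assumptions.

```python
"""Schechter-Berger labelling of residues around a scissile bond."""


def label_subsites(seq: str, cut: int, n: int = 3) -> dict:
    """Label residues Pn..P1 (N-terminal side) and P1'..Pn' (C-terminal side).

    `cut` is the number of residues on the N-terminal side of the scissile
    bond, so the bond lies between seq[cut - 1] and seq[cut]. The enzyme
    subsite that binds residue Pi is Si (and Pi' binds Si').
    """
    labels = {}
    for i in range(1, n + 1):
        if cut - i >= 0:
            labels[f"P{i}"] = seq[cut - i]
        if cut + i - 1 < len(seq):
            labels[f"P{i}'"] = seq[cut + i - 1]
    return labels


if __name__ == "__main__":
    # Lys91 followed by SSVAY (Ser92 onwards); the bond cleaved is Lys91-Ser92.
    print(label_subsites("KSSVAY", cut=1))
```

Here Lys91 is P1 and Ser92, Ser93 and Val94 occupy P1′ to P3′, the residues that would bind the more exposed S′ end of the groove.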

The C-terminal portion of the propeptide (Val66-Arg79) forms an extended loop that sits in the active-site cleft. The poorly defined helix (found only in chain B) that precedes this loop is displaced considerably from the active site, lying around 7-8 Å from the position of the equivalent helix in both cathepsin L and cathepsin B (Fig. 4c). Residues Asn64-Ile67 form a hydrogen-bond network with the cysteine protease domain. These interactions are mainly with Met160-Ser164, but hydrogen bonds are also formed to Asn114 and Leu260. After this, the propeptide enters the active-site groove, with Pro70-Glu72 forming hydrogen bonds to the N-terminal part of the propeptide. Thr76-Arg79 form a large number of hydrogen bonds to both the lectin-like domain and the cysteine protease domain. Close interactions between the propeptide and the cysteine protease domain are seen in many other proteins (Coulombe et al., 1996; Sivaraman et al., 1999), but the lectin-like domain is a newly observed feature of a cysteine protease, and so are its interactions with the propeptide.

Figure 4

Structural comparisons between Cwp84 and other cysteine proteases. (a) Comparison of cysteine protease propeptides and prosegment binding loops (PBLs). Structures are rendered as coils for simplicity. Overview of the whole region, showing the Cwp84 propeptide in red and cysteine protease domain in green, and the cathepsin K (PDB entry 7pck; cathepsin L-like; Sivaraman et al., 1999) propeptide in yellow with the cysteine protease domain in blue; Cwp84 active-site residues are shown in purple. Active-site residues of Cwp84 are shown in magenta and those of cathepsin K are shown in black. Both propeptides cover the active-site groove, shown on the left. Cathepsin propeptides wrap around the protein, interacting with the PBL and forming a conserved helix, while the Cwp84 propeptide folds back on itself and wraps around the lectin-like domain, leaving the top of the active site considerably more exposed. (b) Cross-eyed stereo view of the PBL. The usually conserved α-helix and short β-sheet are not present in Cwp84, with the whole chain rotated by roughly 90°. A turn or short loop below the PBL is replaced by a 16-residue loop that occupies the space normally taken up by the propeptide. (c) Cross-eyed stereo comparison of cysteine protease occluding-loop regions. Cwp84 is shown in green, cathepsin L (PDB entry 1cjl; Coulombe et al., 1996) in blue and cathepsin B (PDB entry 1pbh; Turk et al., 1996) in olive. The active-site residues of Cwp84 (Gln110, C116A and His262) are shown in purple, those of cathepsin L in black and those of cathepsin B in brown. The fold of cathepsin L is well conserved; many cathepsin L-like proteases superpose very closely in this region, and the relatively short loop does not affect interactions with the active site. Cathepsin B-like proteases have a significantly longer, more variable loop that controls substrate specificity and confers carboxypeptidase activity. The equivalent loop in Cwp84 is closer to that of cathepsin L-like proteases but is slightly longer and could be involved in substrate binding.

There are usually two main points on a cysteine protease to which its propeptide is anchored: the surface-exposed PBL (prosegment-binding loop), which the propeptide of Cwp84 does not approach, and the S2 subsite of the active-site cleft, which is occupied by a propeptide residue that mimics the substrate (Coulombe et al., 1996; Sivaraman et al., 1999). Interestingly, in Cwp84 this latter position is occupied by Val66 from the propeptide, whereas the P2 residue of SlpA is usually a lysine. Although Val66 is able to interact with the S2 subsite through van der Waals interactions, its shorter, hydrophobic side chain does not enter the negatively charged pocket (Fig. 3d). Given the apparent lack of PBL stabilization and the shorter Val66 side chain, the propeptide is likely to be stabilized through other interactions involving both the cysteine protease and lectin-like domains.

Treatment of the purified recombinant GST-Cwp84(33-497) protein (78.5 kDa) with trypsin resulted in the loss of approximately 33.5 kDa, giving a single band of 45 kDa. Mass spectrometry gave a mass of 45 058 Da for this product, so the loss of 33.5 kDa is consistent with removal of the proregion and GST. N-terminal sequencing showed that the remaining 45 kDa protein had the N-terminal sequence SSVAY, confirming that the proregion up to Ser92 had been removed. These data suggest that the proregion of Cwp84(33-497) is folded in such a way that it is accessible to trypsin, and that this artificial maturation replicates the removal of the proregion up to Ser92 observed in C. difficile (Chapeton-Montes et al., 2011; de la Riva et al., 2011).
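The mass bookkeeping in the trypsin experiment can be made explicit. The sketch below estimates residue-range masses with an assumed average of 110 Da per residue (a common rule of thumb, not a value from the text) and compares them with the measured values: the 406 residues of Cwp84(92-497) should weigh roughly 44.7 kDa, within about 1% of the 45 058 Da observed, and the 59-residue propeptide (33-91) accounts for only about 6.5 kDa of the 33.5 kDa lost. The remaining ~27 kDa is roughly the size of the GST tag (about 26 kDa) plus vector-derived linker residues; treat this split as an approximation.

```python
"""Rough mass bookkeeping for the trypsin-matured Cwp84 fragment."""

AVG_RESIDUE_DA = 110.0  # assumed average residue mass; not from the text


def n_residues(first: int, last: int) -> int:
    """Number of residues in an inclusive residue range."""
    return last - first + 1


def estimated_mass_da(first: int, last: int) -> float:
    """Crude mass estimate for a residue range using the assumed average."""
    return n_residues(first, last) * AVG_RESIDUE_DA


if __name__ == "__main__":
    observed_da = 45058                      # ESI-MS, from the text
    mature_est = estimated_mass_da(92, 497)  # Ser92 to the end of the construct
    diff_pct = 100 * abs(mature_est - observed_da) / observed_da
    print(f"Cwp84(92-497): {n_residues(92, 497)} residues, ~{mature_est / 1000:.1f} kDa "
          f"(observed {observed_da / 1000:.1f} kDa, {diff_pct:.1f}% difference)")

    lost_kda = 78.5 - 45.0                   # fusion minus product, from the text
    propeptide_kda = estimated_mass_da(33, 91) / 1000
    print(f"lost {lost_kda:.1f} kDa; propeptide (33-91) ~{propeptide_kda:.1f} kDa; "
          f"remainder ~{lost_kda - propeptide_kda:.1f} kDa (GST tag and linker)")
```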

3.3. Cysteine protease domain 

The overall fold of the cysteine protease domain of Cwp84 is similar to those of other papain proteases, particularly cathepsin L-like proteases. A DALI structural similarity search (Holm & Rosenström, 2010) indicates that it is most similar to Toxoplasma gondii cathepsin L (Z = 23.9, sequence identity 20%; PDB entry 3f75; Larson et al., 2009), rhodesain from Trypanosoma brucei (Z = 23.6, sequence identity 21%; PDB entry 2p7u; Kerr et al., 2009) and cruzipain from T. cruzi (Z = 23.5, sequence identity 19%; PDB entry 4klb; Wiggers et al., 2013).




Figure 5 

Calcium ion coordination by the lectin-like domain and two water 
molecules. Nearby hydrogen bonds between the lectin-like domain and 
the cysteine protease domain (two of three sets of charge-based 
interactions between the two domains) are also shown. Domains are 
coloured according to Fig. 1, coordinate bonds are shown in yellow and 
hydrogen bonds are shown in grey. Calcium ion coordination brings 
together distant parts of the primary structure and is likely to be essential 
for correct folding. 



The cysteine protease domain exhibits a typical, approximately U-shaped fold with two subdomains flanking the central active-site cleft: one is formed by a twisted antiparallel β-sheet containing four β-strands (β4, β6, β7 and β8), one helix (α5) and several loop regions, and the other is formed by a central 15-residue α-helix (α1) surrounded by two short α-helices (α3 and α4), an antiparallel β-sheet containing two strands (β3 and β9) and several loop regions (Fig. 2).

The active site of the cysteine protease domain of Cwp84 is similar to those of other cysteine proteases with regard to the positions of the active-site residues Cys116 (mutated to alanine in the present study), His262 and Gln110. Asn287, which has previously been suggested to be an active-site residue (Savariau-Lacomme et al., 2003), is not located within the active site.



3.4. Lectin-like domain 

We have found that the approximately 170-residue 'linker' region between the cysteine protease domain and the first cell-wall-binding domain in full-length Cwp84 forms a single domain (residues 335-497) consisting of 13 β-strands (β10-β22), eight of which form a twisted antiparallel β-sandwich with a hydrophobic core. Proteins with folds similar to this domain were identified using a DALI search. The majority of the most similar results were carbohydrate-binding proteins, including Clostridium perfringens α-N-acetylglucosaminidase (Z = 8.1, sequence identity 14%; PDB entry 2vcc; Ficko-Blean et al., 2008), a sialidase from Micromonospora viridifaciens (Z = 8.0, sequence identity 11%; PDB entry 2bzd; Newstead et al., 2005) and a noncatalytic carbohydrate-binding module from Clostridium thermocellum (Z = 7.7, sequence identity 8%; PDB entry 2yb7; Montanier et al., 2011); we therefore designate this domain the 'lectin-like' domain. There were, however, a significant number of non-carbohydrate-binding results, including the E3 ubiquitin ligase MYCBP2 from Mus musculus (Z = 9.5, sequence identity 13%; PDB entry 3hwj; Sampathkumar et al., 2010), the human DNA-repair protein XRCC1 (Z = 8.2, sequence identity 10%; PDB entry 3k77; Cuneo & London, 2010) and Chlamydomonas reinhardtii intraflagellar transport protein 25 (Z = 8.1, sequence identity 9%; PDB entry 2yc4; Bhogaraju et al., 2011). The lectin-like domain contains a calcium ion coordinated by Leu339, Glu448, Lys460, Asn487 and two water molecules. Most of the conserved residues within the lectin-like domain are found within β-strands, are hydrophobic or bind calcium (Fig. 2). This indicates that the structure, and potentially the function, of the lectin-like domain is conserved amongst these proteins; we believe this to be the first report of such a domain.
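Metal assignments such as the calcium site above are commonly checked by comparing metal-ligand distances with typical values. The sketch below computes each distance and flags those outside a window; the ideal Ca-O distance of 2.4 Å and the ±0.2 Å tolerance are illustrative assumptions, not values taken from Harding (2004), and the coordinates are invented. Only the ligand labels (Leu339, Glu448, Lys460, Asn487 and two waters) follow the text. The function name is hypothetical.

```python
"""Flag metal-ligand distances that deviate from an assumed ideal value."""

import math


def check_coordination(metal, ligands, ideal=2.4, tol=0.2):
    """Return {ligand: (distance, within_tolerance)} for each ligand atom.

    `ideal` and `tol` (Angstrom) are illustrative defaults for Ca-O contacts,
    not values taken from Harding (2004).
    """
    report = {}
    for name, xyz in ligands.items():
        d = math.dist(metal, xyz)  # Euclidean distance (Python 3.8+)
        report[name] = (round(d, 2), abs(d - ideal) <= tol)
    return report


if __name__ == "__main__":
    # Hypothetical octahedral geometry around a Ca2+ ion at the origin;
    # the labels follow the text, but the coordinates are invented.
    ca = (0.0, 0.0, 0.0)
    ligands = {
        "Leu339": (2.38, 0.0, 0.0),
        "Glu448": (-2.41, 0.0, 0.0),
        "Lys460": (0.0, 2.35, 0.0),
        "Asn487": (0.0, -2.44, 0.0),
        "water 1": (0.0, 0.0, 2.40),
        "water 2": (0.0, 0.0, -2.90),  # deliberately too long
    }
    for name, (d, ok) in check_coordination(ca, ligands).items():
        print(f"{name:>8}: {d:.2f} A {'ok' if ok else 'check'}")
    print("coordination number:", len(ligands))
```

A real validation would also consider coordination geometry and B factors; a distance window alone only screens for obviously misplaced ligands.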

The lectin-like domain contains a hydrophobic core that opens at the surface of the protein, producing a hydrophobic pocket formed by residues Ile347, Ile468, Ile477 and Phe483. Interestingly, both Leu36 and Val39 from the propeptide insert into this pocket, and Lys34 hydrogen-bonds to Thr479, suggesting that these interactions, largely hydrophobic, may help to stabilize the propeptide.
The cysteine protease domain and the lectin-like domain also interact with each other at three locations: Gln338, Leu457-Glu458 (Fig. 5) and Tyr408-Asn413. Gln338, which is highly conserved in the BLASTP results (Fig. 2), forms an isolated hydrogen bond; Leu457-Glu458 form main-chain hydrogen bonds, while Tyr408-Asn413 make both main-chain and side-chain interactions.

Two of the three regions where the lectin-like and cysteine 
protease domains interact (Gln338 and Leu457-Glu458) are 
both sequentially and spatially close to the calcium ion- 
binding site (formed by Leu339, Glu448, Lys460 and Asn487). 

4. Discussion 

In this study, we have elucidated the structure of residues 
33-497 of Cwp84, the surface-associated cysteine protease of 
C. difficile, which plays a key role in the maturation of the
S-layer protein SlpA. The high-resolution structural data 
presented here will improve the understanding of the role of 
Cwp84 in S-layer biogenesis. In addition, the discovery of a 
newly identified calcium-binding lectin-like (putative carbo- 
hydrate-binding) domain raises exciting possibilities with 
regard to the potential role(s) that this region may have in 
S-layer biogenesis in C. difficile and also in other species, such 
as those presented in Fig. 2. We also compared the structure of 
the cysteine protease domain (C1A family) of Cwp84 with
those reported for the cysteine protease domains (C80 family)
from the large clostridial toxins of C. difficile (TcdA and TcdB;
Pruitt et al, 2009; Shen et al, 2011) and found no detectable 
structural similarity between the two classes of cysteine 
protease structures. 

We observed that the cysteine protease domain retains a 
strong structural similarity to other papain-family enzymes, 
namely the cathepsins, particularly cathepsin L. However, 
significant differences exist between Cwp84 and structurally 
similar proteases. 

Cathepsin B-like proteases possess a long loop, known as
the occluding loop, which partially blocks the S′ end of the
active site. This allows greater endopeptidase substrate
specificity and also confers carboxypeptidase activity on the
protein, with a conserved HH motif in the occluding loop
binding the substrate at the S2′ position (Sajid & McKerrow,
2002). In the same position, cathepsin L-like proteases possess
a much shorter loop that does not block the active site,
allowing the cleavage of a broader range of substrates
(Coulombe et al, 1996). The equivalent loop in Cwp84 (found
between α4 and β3) is closer to that of cathepsin L-like
proteases. Although slightly longer than is usual for this well
conserved fold, it is much shorter than the occluding loop
found in cathepsin B-like proteases and does not contain the
HH motif (Fig. 4c). This loop is poorly conserved among
closely related proteins (Fig. 2) and thus may be involved in
substrate selectivity.

The loop formed between helix 3 and helix 4, which forms 
one side of the active-site cleft and has a position that is well 
conserved in other cysteine proteases, is roughly 3–4 Å further
away from the active site than usual. This presents a deeper
active-site cleft, which may be important for substrate binding 
and specificity. This loop also contains two residues that form a
β-bridge with the lectin-like domain, forming one of the three
contact points between the two domains (Fig. 5). The active-site
cleft then continues in the S direction, with one side formed by
the cysteine protease domain and the other by the lectin-like
domain; because such a domain has not been observed in other
cysteine protease structures, this gives the S end of the active
site a significantly different shape (Fig. 3).

Moreover, in papain proteases, a residue above the S2
position of the active site has been shown to play a significant
role in determination of substrate specificity: this position is
occupied by Ser205 in papain, Ala214 in cathepsin L and
Glu245 in cathepsin B (Sajid & McKerrow, 2002). In Cwp84,
S2 selectivity is likely to be controlled by Asp320, which, along
with Ser235, Thr317 and Asp318, forms a negatively charged
pocket which is likely to stabilize the binding of the P2 lysine
residue usually found in SlpA (Fig. 3d). Indeed, mutation of
the P2 lysine to alanine has been shown to abolish the cleavage
of an SlpA fragment by Cwp84 in co-expression studies,
suggesting its significance in SlpA cleavage (Dang et al, 2010).

We believe the lectin-like domain to be a newly observed
feature of cysteine proteases, particularly those from
Clostridiales. It bears some resemblance to the jelly-roll domain
of the clostridial serine protease CspB, in that both are
β-sandwiches that are closely associated with a protease domain
(Adams et al, 2013). The two domains could have similar
functions, namely conferring resistance to degradation,
positioning the prodomain for cleavage and ensuring the correct
conformation of the protease domain. Even though the cores of
the lectin-like domains appear to have similar structures, there
are significant differences (resulting in a large root-mean-square
deviation) in the positioning of the β-strands, including the
loop regions. Further experimental studies will be required to
confirm the role(s) of the lectin-like domain in Cwp84.

Interestingly, lectin-like interactions have been suggested 
to be involved in S-layer array formation, particularly with 
regard to the linkage between the S-layer subunits and 
secondary cell-wall polymers (SCWPs; Ferner-Ortner et al, 
2007; Sara et al, 1998; Sara & Sleytr, 2000). 

The carbohydrate-binding region seen in many of the DALI 
results does not appear to be present in Cwp84, indicating that 
if the lectin-like domain does bind carbohydrates, it does so 
using a different interface. IFT25 (intraflagellar transport 
protein 25) has a fold almost identical to that of sialidases, but 
the carbohydrate-binding region is replaced by a region that 
interacts with a helix from IFT27 to form the IFT25/27 
complex (Bhogaraju et al, 2011). In Cwp84, the equivalent 
region interacts with the propeptide. If the Cwp84 lectin-like 
domain does bind carbohydrates (or a different cofactor) in 
this region, it is possible that the propeptide prevents binding. 
It is also possible that, despite its similarity
to carbohydrate-binding proteins, the lectin-like domain of
Cwp84 has a completely different function.

We believe that the close interactions between the cysteine
protease domain, the lectin-like domain and the propeptide
are likely to be essential for the initial folding of the protein
and to mediate substrate binding and specificity.

5. Conclusions 

We have determined the structure of the Cwp84 cysteine 
protease domain with its bound propeptide and a newly 
discovered lectin-like domain. The propeptide sits in the 
active-site groove and wraps itself around the lectin-like 
domain, closely interacting with both domains, a feature that is 
likely to be important in the initial folding of the protein. The 
cysteine protease domain, although similar to many previously 
determined cathepsin L-like structures, bears significant 
differences; namely, the active-site groove is deepened by the 
lectin-like domain, the PBL is not present and the would-be 
occluding loop is slightly longer. The lectin-like domain bears 
a β-sandwich fold similar to that seen in many carbohydrate-
binding proteins, but it is currently unclear what function it 
possesses. If it does bind a carbohydrate, it is possible that the 
lectin-like domain may be involved in substrate recognition or 
attachment to the cell wall, resulting in correct orientation of 
the cysteine protease domain for cleavage of SlpA. 

Further structural and functional studies are necessary to 
elucidate the exact mechanism of Cwp84-mediated SlpA 
cleavage and how this contributes to overall S-layer 
biosynthesis. Given the likely key role of the C. difficile surface
in growth and colonization, structural data such as those
presented here should significantly aid the potential development
of anti-colonization inhibitors or vaccines.